Computer Systems: A Programmer's Perspective (Global Edition): The Road to Execution: Understanding the Compiler Driver

The Conductor: The Compiler Driver

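Think of the compiler driver (such as GCC) as a great conductor. It automates the complex transformation of human-readable source code into a binary executable. This journey, the road to execution, begins at compile time and extends to load time and run time.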

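With separate compilation, the driver processes main.c and sum.c independently. A change to one module does not require retranslating the whole project: only the modified file passes again through the preprocessor (cpp), the compiler (cc1), and the assembler (as), after which the linker (ld) combines the resulting relocatable object files into an executable.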
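Here is a minimal sketch of the kind of module that separate compilation handles. It assumes a two-file program in which main.c calls a function sum defined in sum.c. The signature below is illustrative, not taken from the text. The comments list standard gcc invocations that compile each file to a relocatable object file and then link the results.

```c
/* sum.c: one module of a hypothetical two-file program (main.c + sum.c).
 *
 * Separate compilation with the GCC driver:
 *   gcc -Og -c sum.c               preprocess, compile, assemble -> sum.o
 *   gcc -Og -c main.c              same pipeline for main.c      -> main.o
 *   gcc -Og -o prog main.o sum.o   driver invokes ld to link prog
 *
 * After editing only sum.c, rerun the first and last commands;
 * main.o is reused unchanged.
 */
#include <assert.h>

/* Returns the sum of the first n elements of a. */
int sum(const int *a, int n)
{
    assert(n >= 0);             /* a negative length is a caller bug */
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

On the main.c side, a declaration such as `int sum(const int *a, int n);` is enough to compile a call like `sum(array, 2)`. The reference stays unresolved in main.o until ld matches it against the definition in sum.o. Adding `-v` to a gcc command prints the programs the driver actually runs.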

Efficiency and the Memory Hierarchy

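The address the linker assigns to data such as grid[0][0] or src[0][0] directly affects throughput and latency, because an object's address determines which cache lines it occupies. When the data is aligned to a 32-byte cache line and the code traverses it with a stride-1 reference pattern, each block brought in by a cold miss serves several subsequent accesses. This keeps the miss rate low and avoids the cache-line evictions that column-wise scans cause. In advanced high-performance code, loop unrolling with parallel accumulators ($4 \times 4$ unrolling) further hides latency, including the time spent moving data from main memory into the cache, by optimizing how clock cycles are spent (0x32, 0x1, 0x4, 0x51).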
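The following minimal sketch makes the stride-1 versus column-wise contrast concrete. It assumes 4-byte int and the 32-byte cache block from the text. The grid size N, the function names, and the straddle check are hypothetical illustrations, not code from the text. Both sums visit the same elements; only the traversal order, and therefore the spatial locality, differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 32  /* assumed cache-block size, as in the text */
#define N 16            /* hypothetical grid dimension */

/* Stride-1: a[i][0], a[i][1], ... are adjacent in memory (C arrays are
 * row-major), so each block fetched on a miss also serves the next
 * BLOCK_BYTES / sizeof(int) - 1 accesses (7 with 4-byte ints and
 * block-aligned data). */
long sum_rows(int a[N][N])
{
    long s = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Stride-N: consecutive accesses are N * sizeof(int) bytes apart, so each
 * one touches a different block. If the cache cannot hold a whole
 * column's worth of blocks, a block can be evicted before its neighbours
 * are used, and most accesses can miss. */
long sum_cols(int a[N][N])
{
    long s = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

/* Returns 1 if an object of `size` bytes starting at `addr` spans two
 * BLOCK_BYTES-aligned blocks, so reading it touches two cache lines. */
int straddles_block(uintptr_t addr, size_t size)
{
    assert(size > 0);
    return addr / BLOCK_BYTES != (addr + size - 1) / BLOCK_BYTES;
}
```

Both sums return the same total. Timing them with N large enough for the grid to exceed the cache is what exposes the difference. A 4-byte int at address 0x065E, which is offset 30 within its block, straddles two blocks.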
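The $4 \times 4$ unrolling mentioned in the paragraph can be sketched as follows. This assumes the usual meaning of k × k unrolling: unroll the loop by k and keep k separate accumulators. The function name and the choice of long elements are hypothetical. Because the four accumulator chains are independent, their additions can overlap in the processor's pipeline instead of each waiting on the previous one.

```c
#include <assert.h>
#include <stddef.h>

/* 4x4 loop unrolling: 4 elements per iteration, 4 independent
 * accumulators, so no addition in an iteration depends on another one
 * from the same iteration. */
long sum_unrolled_4x4(const long *a, size_t n)
{
    assert(a != NULL || n == 0);
    long acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;

    /* Main loop; `i + 3 < n` avoids unsigned underflow when n < 3. */
    for (; i + 3 < n; i += 4) {
        acc0 += a[i];
        acc1 += a[i + 1];
        acc2 += a[i + 2];
        acc3 += a[i + 3];
    }
    /* Finish the remaining 0-3 elements. */
    for (; i < n; i++)
        acc0 += a[i];

    /* Combine the partial sums. */
    return (acc0 + acc1) + (acc2 + acc3);
}
```

For integer data the regrouped sum is exact. For floating-point data, regrouping the additions can change the rounded result, which is one reason compilers do not apply this transformation to floating-point code by default.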

QUESTION 1

Which component of the compiler driver is responsible for generating the assembly file (/tmp/main.s)?

The preprocessor (cpp)

The compiler (cc1)

The assembler (as)

The linker (ld)

QUESTION 2

What is a primary benefit of 'Separate Compilation'?

It makes the final executable run faster.

It allows modifications to one file without re-translating others.

It automatically unrolls all loops to 4x4.

It eliminates the need for a linker.

QUESTION 3

How does a Stride-1 reference pattern affect the L1 cache?

It causes column-wise scan evictions.

It maximizes hit rates by utilizing spatial locality.

It bypasses the cache to reduce latency.

It increases the number of cold misses to 100%.

QUESTION 4

What happens at 0x064C if the linker places a multi-byte integer across a 32-byte cache-line boundary?

The compiler driver automatically fixes it at run time.

The L1 cache throughput is maximized.

A potential drop in hit rates and increased latency occurs.

The assembler produces a relocatable error.

QUESTION 5

The hex values 0x32, 0x1, 0x4, and 0x51 in the section above most likely represent:

The binary tags for the L2 cache.

Clock frequency stalls or memory fetch latencies.

The sequence of registers used in a 4x4 unroll.

The static library identifiers.